Clustering and classification

Aarthi Ravindran.20/11/2019 - RStudio Exercise 4: Tasks and Instructions Note: for exercise 5 data wraningling please check the below link [https://github.com/AarthiRavindran/IODS-project/blob/master/create_human.R] Analysis Exercises - 15 marks

  1. Create chapter4.Rmd file and include the child file in ‘index.Rmd’ file.
  2. Load the Boston data from the MASS package and explore the data. (0-1 points)
  3. Show a graphical overview of the data and show summaries of the variables in the data. Describe and interpret the outputs, commenting on the distributions of the variables and the relationships between them. (0-2 points)
#load Boston data from Mass package (Exercise 4.2)

#The following R packages are installed already, which are mandatory to run this analysis
#install.packages("MASS")
#install.packages("corrr")
library(MASS)
library(tidyr)
library(corrr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#install.packages("corrplot")
library(corrplot)
## corrplot 0.84 loaded
#if(!require(devtools)) install.packages("devtools")
#devtools::install_github("ggobi/ggally")
#install.packages("GGally")
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa
data(Boston)

#Explore the Boston data (Exercise 4.2)
str(Boston)
## 'data.frame':    506 obs. of  14 variables:
##  $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ rm     : num  6.58 6.42 7.18 7 7.15 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
##  $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ black  : num  397 397 393 395 397 ...
##  $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
dim(Boston)
## [1] 506  14
summary(Boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08204   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio          black       
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00
#Graphical Overview of data (Exercise 4.3)
pairs(Boston)

#Calculate the correlation of variables in the matrix
cor_matrix <- cor(Boston)%>%round(digits = 2)
#print correlation matrix
cor_matrix
##          crim    zn indus  chas   nox    rm   age   dis   rad   tax
## crim     1.00 -0.20  0.41 -0.06  0.42 -0.22  0.35 -0.38  0.63  0.58
## zn      -0.20  1.00 -0.53 -0.04 -0.52  0.31 -0.57  0.66 -0.31 -0.31
## indus    0.41 -0.53  1.00  0.06  0.76 -0.39  0.64 -0.71  0.60  0.72
## chas    -0.06 -0.04  0.06  1.00  0.09  0.09  0.09 -0.10 -0.01 -0.04
## nox      0.42 -0.52  0.76  0.09  1.00 -0.30  0.73 -0.77  0.61  0.67
## rm      -0.22  0.31 -0.39  0.09 -0.30  1.00 -0.24  0.21 -0.21 -0.29
## age      0.35 -0.57  0.64  0.09  0.73 -0.24  1.00 -0.75  0.46  0.51
## dis     -0.38  0.66 -0.71 -0.10 -0.77  0.21 -0.75  1.00 -0.49 -0.53
## rad      0.63 -0.31  0.60 -0.01  0.61 -0.21  0.46 -0.49  1.00  0.91
## tax      0.58 -0.31  0.72 -0.04  0.67 -0.29  0.51 -0.53  0.91  1.00
## ptratio  0.29 -0.39  0.38 -0.12  0.19 -0.36  0.26 -0.23  0.46  0.46
## black   -0.39  0.18 -0.36  0.05 -0.38  0.13 -0.27  0.29 -0.44 -0.44
## lstat    0.46 -0.41  0.60 -0.05  0.59 -0.61  0.60 -0.50  0.49  0.54
## medv    -0.39  0.36 -0.48  0.18 -0.43  0.70 -0.38  0.25 -0.38 -0.47
##         ptratio black lstat  medv
## crim       0.29 -0.39  0.46 -0.39
## zn        -0.39  0.18 -0.41  0.36
## indus      0.38 -0.36  0.60 -0.48
## chas      -0.12  0.05 -0.05  0.18
## nox        0.19 -0.38  0.59 -0.43
## rm        -0.36  0.13 -0.61  0.70
## age        0.26 -0.27  0.60 -0.38
## dis       -0.23  0.29 -0.50  0.25
## rad        0.46 -0.44  0.49 -0.38
## tax        0.46 -0.44  0.54 -0.47
## ptratio    1.00 -0.18  0.37 -0.51
## black     -0.18  1.00 -0.37  0.33
## lstat      0.37 -0.37  1.00 -0.74
## medv      -0.51  0.33 -0.74  1.00
#visualize the correlation matrix
corrplot(cor_matrix, metho="circle", type="upper", cl.pos="b", tl.pos="d", tl.cex = 0.6)

#Another package for correlation matrix
# calculate the correlation matrix and round it
cor_matrix_gg <- ggcorr(Boston, palette = "RdBu", label = TRUE)
cor_matrix_gg

*Description of Boston Data:*
Bostan data is about the housing values in suburbs.This data is having 506 rows and 14 columns (using dim()), the columns are explained below,
crim - per capita crime rate by town, zn - proportion of residential land zoned for lots over 25,000 sq.ft., indus - proportion of non-retail business acres per town, chas - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise), nox - nitrogen oxides concentration (parts per 10 million), rm - average number of rooms per dwelling, age - proportion of owner-occupied units built prior to 1940, dis - weighted mean of distances to five Boston employment centres, rad - index of accessibility to radial highways, tax - full-value property-tax rate per $10,000, ptration - pupil-teacher ratio by town, pratio - pupil-teacher ratio by town, black - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town, lstat - lower status of the population (percent), medv - median value of owner-occupied homes in $1000s.

Boston Data Variables Interpretation from Data Exploration and Graphical Plots:
From str() command we can see all the variables in the files are numeric or integer.
From summary() we can see minimum, maximum, medium, 1st and 3rd quartile of each varibales. For example, the crime rate ranges from minimum of 0.006 and maximum is 88.97, maximum of 27.74 proportion of non-retail busincess acer in boston, each housing has minimum of 3 room numbers to maximum of 8 rooms, the owners had them with different time points from minimum of 2.9 to maximum of 100 and so on.

The relationship between two dataset or variables in one data can be explained by the correlation among them. To define briefly, When two sets of data are strongly linked together we say they have a High Correlation. The word Correlation is made of Co- (meaning “together”), and Relation. Correlation is Positive when the values increase together, and. Correlation is Negative when one value decreases as the other increases.

From graphical plot pairs(), corrplot(), ggcor() which are from the correlation matrix, we can see some of the parameters are correlated either positively or negatively.

For example from the correlation plots we can clearly see,
crim is postively correlated with rad and tax,
zn is negatively correlated with indus, nox, age and postively correlated with dis
nox is postively correlated with age, rad, tax, istat and negatively correlated with dis, medv and so on.

From correlation plot we will be able to see clearly than the pairs, also ggcor cor also gives us the correlation values.
########################################################################################################################

  1. Standardize the dataset and print out summaries of the scaled data. How did the variables change? Create a categorical variable of the crime rate in the Boston dataset (from the scaled crime rate). Use the quantiles as the break points in the categorical variable. Drop the old crime rate variable from the dataset. Divide the dataset to train and test sets, so that 80% of the data belongs to the train set. (0-2 points)
#standardize the dataset
boston_scaled <- scale(Boston) #in scaling the outlier are removed
summary(boston_scaled)
##       crim                 zn               indus        
##  Min.   :-0.419367   Min.   :-0.48724   Min.   :-1.5563  
##  1st Qu.:-0.410563   1st Qu.:-0.48724   1st Qu.:-0.8668  
##  Median :-0.390280   Median :-0.48724   Median :-0.2109  
##  Mean   : 0.000000   Mean   : 0.00000   Mean   : 0.0000  
##  3rd Qu.: 0.007389   3rd Qu.: 0.04872   3rd Qu.: 1.0150  
##  Max.   : 9.924110   Max.   : 3.80047   Max.   : 2.4202  
##       chas              nox                rm               age         
##  Min.   :-0.2723   Min.   :-1.4644   Min.   :-3.8764   Min.   :-2.3331  
##  1st Qu.:-0.2723   1st Qu.:-0.9121   1st Qu.:-0.5681   1st Qu.:-0.8366  
##  Median :-0.2723   Median :-0.1441   Median :-0.1084   Median : 0.3171  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.:-0.2723   3rd Qu.: 0.5981   3rd Qu.: 0.4823   3rd Qu.: 0.9059  
##  Max.   : 3.6648   Max.   : 2.7296   Max.   : 3.5515   Max.   : 1.1164  
##       dis               rad               tax             ptratio       
##  Min.   :-1.2658   Min.   :-0.9819   Min.   :-1.3127   Min.   :-2.7047  
##  1st Qu.:-0.8049   1st Qu.:-0.6373   1st Qu.:-0.7668   1st Qu.:-0.4876  
##  Median :-0.2790   Median :-0.5225   Median :-0.4642   Median : 0.2746  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.6617   3rd Qu.: 1.6596   3rd Qu.: 1.5294   3rd Qu.: 0.8058  
##  Max.   : 3.9566   Max.   : 1.6596   Max.   : 1.7964   Max.   : 1.6372  
##      black             lstat              medv        
##  Min.   :-3.9033   Min.   :-1.5296   Min.   :-1.9063  
##  1st Qu.: 0.2049   1st Qu.:-0.7986   1st Qu.:-0.5989  
##  Median : 0.3808   Median :-0.1811   Median :-0.1449  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.4332   3rd Qu.: 0.6024   3rd Qu.: 0.2683  
##  Max.   : 0.4406   Max.   : 3.5453   Max.   : 2.9865
class(boston_scaled)
## [1] "matrix"
boston_scaled <- as.data.frame(boston_scaled)

#Create the catagorical variable of crime rate from scaled data
#create the quantile vector
bins <- quantile(boston_scaled$crim)
bins
##           0%          25%          50%          75%         100% 
## -0.419366929 -0.410563278 -0.390280295  0.007389247  9.924109610
#create the categorical variable of crime
crime <- cut(boston_scaled$crim, breaks = bins, include.lowest = TRUE, labels = c("low", "med_low", "med_high", "high"))

# look at the table of the new factor crime
table(crime)
## crime
##      low  med_low med_high     high 
##      127      126      126      127
# remove original crim from the dataset
boston_scaled <- dplyr::select(boston_scaled, -crim)

# add the new categorical value to scaled data
boston_scaled <- data.frame(boston_scaled, crime)
# boston_scaled is available

# number of rows in the Boston dataset 
n <- nrow(boston_scaled)

# choose randomly 80% of the rows
ind <- sample(n,  size = n * 0.8)

# create train set
train <- boston_scaled[ind,]

# create test set 
test <- boston_scaled[-ind,]

# save the correct classes from test data
correct_classes <- test$crime

# remove the crime variable from test data
test <- dplyr::select(test, -crime)

#Check the structure of the new test data
str(boston_scaled)
## 'data.frame':    506 obs. of  14 variables:
##  $ zn     : num  0.285 -0.487 -0.487 -0.487 -0.487 ...
##  $ indus  : num  -1.287 -0.593 -0.593 -1.306 -1.306 ...
##  $ chas   : num  -0.272 -0.272 -0.272 -0.272 -0.272 ...
##  $ nox    : num  -0.144 -0.74 -0.74 -0.834 -0.834 ...
##  $ rm     : num  0.413 0.194 1.281 1.015 1.227 ...
##  $ age    : num  -0.12 0.367 -0.266 -0.809 -0.511 ...
##  $ dis    : num  0.14 0.557 0.557 1.077 1.077 ...
##  $ rad    : num  -0.982 -0.867 -0.867 -0.752 -0.752 ...
##  $ tax    : num  -0.666 -0.986 -0.986 -1.105 -1.105 ...
##  $ ptratio: num  -1.458 -0.303 -0.303 0.113 0.113 ...
##  $ black  : num  0.441 0.441 0.396 0.416 0.441 ...
##  $ lstat  : num  -1.074 -0.492 -1.208 -1.36 -1.025 ...
##  $ medv   : num  0.16 -0.101 1.323 1.182 1.486 ...
##  $ crime  : Factor w/ 4 levels "low","med_low",..: 1 1 1 1 1 1 2 2 2 2 ...
dim(boston_scaled)
## [1] 506  14

Scaling Results: The scaling of data is important for data analysis, it remove the oulier and scale the data to fit in a modle, in this way we will avoid the bias in data analysis.
AS result of scaling we can see changes in maximum, minimum, 1st, 3rd quartile, mean and median of variables.
For example The max() of crime rate from 88.9 decreased to 9.92.
Then in this data, we know all the variables are numeric/integer, we are changing the crime rate as catagorical variable and split the data into train and test to check the how well the model works. Here 80% of data belong to train set, it is done by taking 80% of rows.
As the result of converting the crime rate into categorical variable, now we can clearly see data into four groups where 127 row in low crime rate, 126 row in medium low crime rate, 126 row in medium high and 127 row in high crime rate. The data is almost divided equally based on the crime rate.
#############################################################################################################################

  1. Fit the linear discriminant analysis on the train set. Use the categorical crime rate as the target variable and all the other variables in the dataset as predictor variables. Draw the LDA (bi)plot. (0-3 points)

Discussion: Before going into Exercise 4.5, we have to check whether the data is having categorical variables. From str() we can clearly see one of the variable “crime” has factor with four levels of low, med_low, med_high, high which is mandatory for Linear Discriminanat Analysis.

Linear Discriminant Analysis:
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique. As the name implies dimensionality reduction techniques reduce the number of dimensions (i.e. variables) in a dataset while retaining as much information as possible. Discriminant analysis is used when groups are known a priori (unlike in cluster analysis). Each case must have a score on one or more quantitative predictor measures, and a score on a group measure.

# linear discriminant analysis
lda.fit <- lda(crime ~ ., data = train) #this train data set has 80% of data

# print the lda.fit object
lda.fit
## Call:
## lda(crime ~ ., data = train)
## 
## Prior probabilities of groups:
##       low   med_low  med_high      high 
## 0.2722772 0.2475248 0.2500000 0.2301980 
## 
## Group means:
##                  zn      indus        chas        nox          rm
## low       1.0631581 -0.9287034 -0.09336997 -0.8943253  0.43854502
## med_low  -0.1536561 -0.2370401  0.04263895 -0.5332870 -0.12448371
## med_high -0.3921463  0.1819976  0.19544522  0.4142127  0.08460485
## high     -0.4872402  1.0173143 -0.10299142  1.0539813 -0.45573081
##                 age        dis        rad        tax     ptratio
## low      -0.9336852  0.8816305 -0.7031069 -0.7154662 -0.46152199
## med_low  -0.2695989  0.2915303 -0.5511961 -0.4726386 -0.06352751
## med_high  0.3850937 -0.3615610 -0.4076377 -0.3064785 -0.22138939
## high      0.8391387 -0.8563401  1.6361396  1.5126335  0.77846144
##               black       lstat       medv
## low       0.3783551 -0.77029159  0.5234763
## med_low   0.3183889 -0.11740037 -0.0191154
## med_high  0.1128226  0.02617773  0.1385354
## high     -0.7701743  0.91445264 -0.7042317
## 
## Coefficients of linear discriminants:
##                 LD1         LD2         LD3
## zn       0.11118999  0.81145644 -0.84572420
## indus    0.02306631 -0.34248144  0.45354325
## chas    -0.09346622 -0.03864378  0.10689715
## nox      0.33615730 -0.62116213 -1.61020728
## rm      -0.06385631 -0.08001899 -0.09521702
## age      0.30482702 -0.30512396 -0.03263238
## dis     -0.08681918 -0.36270378  0.15913564
## rad      3.22925760  0.85263541  0.22159885
## tax     -0.06374789  0.07379666  0.35082305
## ptratio  0.10984603 -0.01055904 -0.42106268
## black   -0.12470366  0.00089685  0.13472357
## lstat    0.25686594 -0.23315451  0.27796512
## medv     0.18269210 -0.43164584 -0.34098584
## 
## Proportion of trace:
##    LD1    LD2    LD3 
## 0.9421 0.0430 0.0149
# the function for lda biplot arrows
lda.arrows <- function(x, myscale = 1, arrow_heads = 0.1, color = "orange", tex = 0.75, choices = c(1,2)){
  heads <- coef(x)
  arrows(x0 = 0, y0 = 0, 
         x1 = myscale * heads[,choices[1]], 
         y1 = myscale * heads[,choices[2]], col=color, length = arrow_heads)
  text(myscale * heads[,choices], labels = row.names(heads), 
       cex = tex, col=color, pos=3)
}

# target classes as numeric
classes <- as.numeric(train$crime)

# plot the lda results
plot(lda.fit, dimen = 2, col = classes, pch = classes)
lda.arrows(lda.fit, myscale = 1)

LD analysis Discussion: From the reduction plot, distribution of data into different clustering based on crime is clearly seen. Some of the med_high values are clustered well with high values than the manual clustering, it would be better to cluster them into high category. Also we can see other variables distribution according to crime rate in the dimensional biplot. For example variable rad is highly correlated with high crime rate.
#############################################################################################################################

  1. Save the crime categories from the test set and then remove the categorical crime variable from the test dataset. Then predict the classes with the LDA model on the test data. Cross tabulate the results with the crime categories from the test set. Comment on the results. (0-3 points)
# predict classes with test data
lda.pred <- predict(lda.fit, newdata = test)

# cross tabulate the results
table(correct = correct_classes, predicted = lda.pred$class)
##           predicted
## correct    low med_low med_high high
##   low        8       7        2    0
##   med_low    4      18        4    0
##   med_high   0       7       17    1
##   high       0       0        0   34
#compare the total number of new corrected one to old corrected one

#old uncorrected count
table(crime)
## crime
##      low  med_low med_high     high 
##      127      126      126      127
#newly corrected count
table(correct_classes)
## correct_classes
##      low  med_low med_high     high 
##       17       26       25       34
#or number in each level in corrected ones
lda.fit$counts
##      low  med_low med_high     high 
##      110      100      101       93

Discussion of predicted model: Manually we divide them based on numeric range, whereas through LDA analysis we can cluster the levels that is best fitted with reference to other variables in dataset. As we can see previously 127 row in low group, whereas 29 of them are corrected and 98 remain in the cluster and so on for each cluster.
#################################################################################################################################

  1. Reload the Boston dataset and standardize the dataset (we did not do this in the Datacamp exercises, but you should scale the variables to get comparable distances). Calculate the distances between the observations. Run k-means algorithm on the dataset. Investigate what is the optimal number of clusters and run the algorithm again. Visualize the clusters (for example with the pairs() or ggpairs() functions, where the clusters are separated with colors) and interpret the results. (0-4 points)
#1. load the Boston data
library(MASS)
data('Boston')

#2. Scale the Boston Data and standardize the dataset
scaled_Boston <- scale(Boston)

#3. calculate the distances between the observations

# euclidean distance matrix
dist_eu <- dist(scaled_Boston)

# look at the summary of the distances
summary(dist_eu)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1343  3.4625  4.8241  4.9111  6.1863 14.3970
# manhattan distance matrix
dist_man <- dist(Boston, method = 'manhattan')

# look at the summary of the distances
summary(dist_man)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    2.016  149.145  279.505  342.899  509.707 1198.265
#without scaling we get the following answer
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  1.119  85.624 170.539 226.315 371.950 626.047 
#    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#   2.016  149.145  279.505  342.899  509.707 1198.265 

#4. Run K-means cluster
# k-means clustering
km <-kmeans(scaled_Boston, centers = 4)

# plot the Boston dataset with clusters
pairs(scaled_Boston, col = km$cluster)

Discussion:
From the above kmeans clustering, the data is clustered based on the distance of variables. The pairs plot shows us the distrubution of clusters in each variable. Here i have divied into four center and it is showen in four different colors as seen in the picture. This clustering differ from LD analysis, which was done based on the catagorical variable wheres this is unsupervised clustering based on distance.

Super-Bonus: Run the code below for the (scaled) train data that you used to fit the LDA. The code creates a matrix product, which is a projection of the data points.

model_predictors <- dplyr::select(train, -crime)
# check the dimensions
dim(model_predictors)
## [1] 404  13
dim(lda.fit$scaling)
## [1] 13  3
# matrix multiplication
matrix_product <- as.matrix(model_predictors) %*% lda.fit$scaling
matrix_product <- as.data.frame(matrix_product)
#install.packages("plotly")
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:MASS':
## 
##     select
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot_ly(x = matrix_product$LD1, y = matrix_product$LD2, z = matrix_product$LD3, type= 'scatter3d', mode='markers', color = train$crime)